Welcome
- Learn to use R to calculate a 1-sample t-test
- Apply the steps for hypothesis testing from lectures
- Learn how to interpret statistical output
Before you begin
You can download the data
- From module 5 in Canvas
- ENVX1002_Data5.xlsx, if you are viewing the HTML file from GitHub: https://Github.com/envx-resources
Create a new project
Reminder (skip to step 2 if you are going to use the directory you created in your tutorial)
Step 1: Create a new project for the practical and put it in your ENVX1002 folder. File > New Project > New Directory > New Project.
Step 2: Download the data files from Canvas or using the link above and copy them into your project directory.
I recommend that you make a data folder in your project directory to keep things tidy! If you make a data folder in your project directory you will need to indicate this path before the file name.
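If you do keep the spreadsheet in a data folder, the path can be built as in the short sketch below. The folder name `data` is an assumption that matches the code used later in this practical; change it to whatever you called your folder.

```r
# Build the path to the spreadsheet, assuming it sits in a folder
# called "data" inside the project directory (folder name is an assumption).
data_path <- file.path("data", "ENVX1002_Data5.xlsx")
data_path

# Check that R can see the file before trying to read it.
# This returns FALSE if the file is missing or the working directory is wrong.
file.exists(data_path)
```

If `file.exists()` returns FALSE, check the working directory with `getwd()` before changing the path.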
Step 3: Open a new Quarto file.
i.e. File > New File > Quarto Document and save it immediately i.e. File > Save.
Problems with your personal computer and R
NOTE: If you are having problems with R on your personal computer that cannot easily be solved by a demonstrator, please use the Lab PCs.
Installing packages
Remember: all of the functions and data sets in R are organised into packages. The standard (or base) packages are part of the R source code, and the functions and data sets in them are automatically available when R is opened. There are also many contributed packages, written by many different authors, often to implement methods that are not available in the base packages. If you cannot find a method in the base packages, you may be able to find it in a contributed package.
Many contributed packages can be downloaded from the Comprehensive R Archive Network (CRAN) site (http://cran.r-project.org/); click on Packages on the left-hand side. In this class we will download two packages using the install.packages command and then load each package into R using the library command.
Alternatively, in RStudio click on the Packages tab > Install > type in package name > click install.
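As a rough sketch, the install-then-load pattern looks like this. The package names `readxl` and `ggplot2` are an assumption based on the packages loaded later in this practical; the text above does not name the two packages.

```r
# Packages used later in this practical (assumed to be the two meant above).
pkgs <- c("readxl", "ggplot2")

# Find which of them are not yet installed on this computer.
not_installed <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
not_installed

# Install only what is missing (run once; needs an internet connection).
# install.packages(not_installed)

# Then load the packages at the top of your document, in every new session.
# library(readxl)
# library(ggplot2)
```

Installing is done once per computer, while `library()` has to be run every time you start a new R session.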
Exercise 1: 1-sample t-test Milk Yield - Walk through
This exercise will walk you through how to test a hypothesis, check assumptions and eventually draw a conclusion on your initial hypothesis. 100 cows have their milk yield measured. Suppose we wish to test whether these milk yields (units unknown) differ significantly from the economic threshold of 11 units. (The units may possibly be litres of milk produced on a particular day).
The average Australian drinks about 100 litres of milk per year. The average cow produces between 12 and 30 litres of milk per day.
The data is in the Milk sheet found in the ENVX1002_Data5.xlsx file. You will follow the steps as outlined in the lectures:
- Choose level of significance (α)
- Write null and alternate hypotheses
- Check assumptions (normal)
- Calculate test statistic
- Obtain P-value or critical value
- Make statistical conclusion
- Write a scientific (biological) conclusion
You can remember the steps above using the acronym HATPC.
Let's go:
1. Normally you choose 0.05 as a level of significance:
This value is generally accepted in the scientific community. The choice also affects type 2 errors: choosing a lower significance level reduces the chance of a type 1 error but increases the likelihood of a type 2 error.
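The trade-off between the significance level and type 2 errors can be illustrated with `power.t.test()` from base R. This is only a sketch: the sample size, effect size and standard deviation below are made-up values, not taken from the milk data.

```r
# Hypothetical study: 20 observations, a true mean 1 unit away from the
# hypothesised mean, and a standard deviation of 2 (all values are made up).
power_05 <- power.t.test(n = 20, delta = 1, sd = 2, sig.level = 0.05,
                         type = "one.sample")$power
power_01 <- power.t.test(n = 20, delta = 1, sd = 2, sig.level = 0.01,
                         type = "one.sample")$power

# Type 2 error rate (beta) is 1 - power.
beta_05 <- 1 - power_05
beta_01 <- 1 - power_01

# The stricter significance level gives the larger type 2 error rate.
c(alpha_0.05 = beta_05, alpha_0.01 = beta_01)
```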
2. Write null and alternative hypotheses:
Question: Write down the null hypothesis and alternative hypotheses:
H0: < Type your answer here >
H1: < Type your answer here >
Solution
Question: Write down the null hypothesis and alternative hypotheses:
H0: \(\mu_{yield}\) = 11 units
H1: \(\mu_{yield}\) \(\neq\) 11 units
3. Check assumptions (normality):
a. load data:
Make sure you set your working directory first
Solution
library(readxl)
milk <- read_excel("data/ENVX1002_Data5.xlsx", sheet = "Milk")
It is always good practice to look at the data first, to make sure you have the correct data, that it loaded correctly, and that you know the names of the columns. You can do this by typing the name of the data object, milk, or, for large datasets, by using str() to show the structure of the data (head() shows the first 6 rows):
# Type your R code here
Solution
str(milk)
tibble [100 × 1] (S3: tbl_df/tbl/data.frame)
$ Yield: num [1:100] 18.5 15.9 13.1 15.1 5.7 9.4 15.3 17.6 18.4 22 ...
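A few other ways of inspecting a data set are sketched below. To keep the example self-contained it uses only the first ten yields printed in the `str()` output above, stored under the hypothetical name `milk_preview`; with the real data you would call the same functions on `milk`.

```r
# First ten yields, copied from the str() output above (not the full data set).
milk_preview <- data.frame(
  Yield = c(18.5, 15.9, 13.1, 15.1, 5.7, 9.4, 15.3, 17.6, 18.4, 22)
)

str(milk_preview)            # structure: column names and types
head(milk_preview)           # first 6 rows
names(milk_preview)          # column names only
summary(milk_preview$Yield)  # five-number summary and mean
```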
b. Tests for normality:
Q-Q plots:
# Type your R code here
Solution
# Load library
library(ggplot2)
ggplot(milk, aes(sample = Yield)) +
  stat_qq() +
  stat_qq_line()
Histogram and boxplots:
# Type your R code here
Solution
# Histogram
ggplot(milk, aes(x = Yield)) +
  geom_histogram(binwidth = 1, fill = "lightblue", color = "black") +
  labs(title = "Histogram of Milk Yield", x = "Yield", y = "Frequency")
# Boxplot
ggplot(milk, aes(x = Yield)) +
  geom_boxplot(fill = "lightblue", color = "black")
Question: Do the plots indicate the data are normally distributed?
Answer: < Type your answer here >
Solution
Question: Do the plots indicate the data are normally distributed?
Answer: Yes. Think about why.
Shapiro-Wilk test of normality:
# Type your R code here
Solution
shapiro.test(milk$Yield)
Shapiro-Wilk normality test
data: milk$Yield
W = 0.98967, p-value = 0.6379
Question: Does the Shapiro-Wilk test indicate the data are normally distributed? Explain your answer.
Answer: < Type your answer here >
Solution
Question: Does the Shapiro-Wilk test indicate the data are normally distributed? Explain your answer.
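Answer: Yes. The p-value (0.6379) is greater than 0.05, so we do not reject the null hypothesis that the data are normally distributed.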
4. Calculate the test statistic
In R we achieve this via the command t.test(milk$Yield, mu = …). Where mu = … is written, enter the hypothesised mean. The R output first gives the calculated t value, the degrees of freedom and the p-value; it then gives the 95% CI and the sample mean.
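For reference, the t value reported by t.test() is the standard one-sample t statistic. The notation here is our own rather than the lab's: \(\bar{x}\) is the sample mean, \(s\) the sample standard deviation, \(n\) the sample size and \(\mu_0\) the hypothesised mean.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \text{df} = n - 1
```

The denominator \(s / \sqrt{n}\) is the standard error of the mean, so t measures how many standard errors the sample mean lies from the hypothesised mean.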
# Type your R code here
Solution
t.test(milk$Yield, mu = 11)
One Sample t-test
data: milk$Yield
t = 4.9291, df = 99, p-value = 3.323e-06
alternative hypothesis: true mean is not equal to 11
95 percent confidence interval:
12.53485 14.60315
sample estimates:
mean of x
13.569
5. Obtain P-value or critical value
Question: Does the hypothesised economic threshold lie within the confidence interval?
Answer: < Type your answer here >
Solution
Question: Does the hypothesised economic threshold lie within the confidence interval?
Answer: No. The 95% CI (12.53, 14.60) does not contain 11.
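The p-value and the critical value can also be obtained directly from the t distribution. The sketch below copies t and df from the t.test() output above; the object names are our own.

```r
# Values copied from the t.test() output above.
t_obs <- 4.9291
dof   <- 99
alpha <- 0.05

# Two-sided critical value: reject H0 if |t| exceeds this.
t_crit <- qt(1 - alpha / 2, df = dof)
t_crit

# Two-sided p-value: probability of a t value at least this extreme under H0.
p_value <- 2 * pt(-abs(t_obs), df = dof)
p_value

# Both approaches lead to the same decision.
abs(t_obs) > t_crit   # TRUE
p_value < alpha       # TRUE
```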
6. Make statistical conclusion
Question: Based on the P-value, do we retain or reject the null hypothesis?
Answer: < Type your answer here >
Solution
Question: Based on the P-value, do we retain or reject the null hypothesis?
Answer: Reject the null hypothesis, because the p-value (3.323e-06) is less than 0.05.
7. Write a scientific (biological) conclusion
Question: Now write a scientific (biological) conclusion based on the outcome in 6.
Answer: < Type your answer here >
Solution
Question: Now write a scientific (biological) conclusion based on the outcome in 6.
Answer: The milk yields differ significantly from the economic threshold of 11 units. In fact, the cows tested yield an average of 13.6 units (95% CI: 12.5, 14.6), which is significantly higher than the economic threshold of 11 units.
Exercise 2: Stinging trees (individual or in pairs)
Data file: the Stinging sheet in ENVX1002_Data5.xlsx
A forest ecologist, studying regeneration of rainforest communities in gaps caused by large trees falling during storms, read that seedlings of the stinging tree, Dendrocnide excelsa, will grow 1.5 m/year in direct sunlight, such as in gaps. In the gaps in her study plot, she identified 9 specimens of this species and measured them in 1998 and again 1 year later.
Does her data support the published contention that seedlings of this species will average 1.5m of growth per year in direct sunlight? Also, calculate a 95% CI for the true mean. Analyse the data in R. Due to the small sample size we have to assume the data is normal.
It was found that researchers wearing welding gloves and a full body suit were still stung by the tree. The sting is extremely painful and can last for months. The pain is caused by a neurotoxin that is injected into the skin. The tree is found in the rainforests of north-eastern Australia.
Work through the steps below individually or in pairs. Add more code chunks if required (click insert -> R on above toolbar)
- Choose level of significance (α)
Answer:
Solution
- Choose level of significance (α)
Answer: 0.05 is generally accepted in the scientific community.
- Write null and alternate hypotheses
H0:
H1:
Solution
- Write null and alternate hypotheses
H0: \(\mu_{growth}\) = 1.5m/year
H1: \(\mu_{growth}\) \(\neq\) 1.5m/year
- Check assumptions (normal)
Read in the data:
library(readxl)
sting <- read_excel("data/ENVX1002_Data5.xlsx", sheet = "Stinging")
sting
# A tibble: 9 × 1
Stinging
<dbl>
1 1.9
2 2.5
3 1.6
4 2
5 1.5
6 2.7
7 1.9
8 1
9 2
Plot your data:
# Type your R code here
Solution
# Q-Q plot
ggplot(sting, aes(sample = Stinging)) +
  stat_qq() +
  stat_qq_line()
# Histogram
ggplot(sting, aes(x = Stinging)) +
  geom_histogram(binwidth = 1, fill = "lightgreen", color = "black") +
  labs(title = "Histogram of Stinging Tree Growth", x = "Growth (m)", y = "Frequency")
# Boxplot (the y axis carries no information here, so it is left unlabelled)
ggplot(sting, aes(x = Stinging)) +
  geom_boxplot(fill = "lightgreen", color = "black") +
  labs(title = "Boxplot of Stinging Tree Growth", x = "Growth (m)", y = NULL)
Normality tests:
# Type your R code here
Solution
shapiro.test(sting$Stinging)
Shapiro-Wilk normality test
data: sting$Stinging
W = 0.96096, p-value = 0.8083
Question: Are the data normally distributed? Explain your answer.
Answer: < Type your answer here >
Solution
Question: Are the data normally distributed? Explain your answer.
Answer: Yes. Both the plots and the Shapiro-Wilk test (p-value > 0.05) suggest the data are normally distributed.
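The comparison of the Shapiro-Wilk p-value with the significance level can also be done in code. In this sketch the growth values are typed in from the tibble printed above, under the hypothetical name `growth`. Keep in mind that with only 9 observations a large p-value means there is no evidence against normality, not proof of it.

```r
# Growth values as printed in the tibble above.
growth <- c(1.9, 2.5, 1.6, 2, 1.5, 2.7, 1.9, 1, 2)
alpha  <- 0.05

# shapiro.test() returns a list; the pieces can be pulled out by name.
sw <- shapiro.test(growth)
sw$statistic   # W
sw$p.value

# TRUE means there is no evidence against normality at level alpha.
sw$p.value > alpha
```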
- Calculate test statistic and
- Obtain P-value or critical value
# Type your R code here
Solution
t.test(sting$Stinging, mu = 1.5)
One Sample t-test
data: sting$Stinging
t = 2.3534, df = 8, p-value = 0.04643
alternative hypothesis: true mean is not equal to 1.5
95 percent confidence interval:
1.508055 2.291945
sample estimates:
mean of x
1.9
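When writing up the conclusions it can help to store the test result and extract the numbers you need, rather than copying them from the printed output. A minimal sketch, again typing the growth values in from the tibble above under the hypothetical name `growth`:

```r
# Growth values as printed in the tibble above.
growth <- c(1.9, 2.5, 1.6, 2, 1.5, 2.7, 1.9, 1, 2)

# Store the result instead of only printing it.
res <- t.test(growth, mu = 1.5)

res$statistic   # t value
res$parameter   # degrees of freedom
res$p.value     # p-value
res$estimate    # sample mean

# Confidence interval rounded for reporting.
round(res$conf.int, 2)
```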
- Make statistical conclusion
Answer:
Solution
- Make statistical conclusion
Answer: P < 0.05 so we reject the null hypothesis \(\mu_{growth}\) = 1.5m/year
- Write a scientific (biological) conclusion
Answer:
Solution
- Write a scientific (biological) conclusion
Answer: The mean growth rate of the stinging tree, Dendrocnide excelsa, differs significantly from 1.5 m/year. The mean growth rate is 1.9 m/year (95% CI: 1.51, 2.29), so the seedlings grew faster than the published value.
Check your answers with teaching staff
Thanks!
Bonus take-home exercises
For each of these exercises, follow the steps outlined in the lectures (and this lab!) to test your hypotheses:
- Choose level of significance (α)
- Write null and alternate hypotheses
- Check assumptions (normal)
- Calculate test statistic
- Obtain P-value or critical value
- Make statistical conclusion
- Write a scientific (biological) conclusion
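Because the same steps are repeated in every exercise, they can be wrapped in a small helper. The function below, `one_sample_report()`, is hypothetical and not part of the lab; it is only one possible way to collect the numbers needed for the statistical conclusion. The plots and the written conclusions still have to be done by you. The example values are made up.

```r
# Hypothetical helper (not part of the lab): runs the normality check and
# the 1-sample t-test, and returns the pieces needed for the conclusions.
one_sample_report <- function(x, mu, alpha = 0.05) {
  x  <- x[!is.na(x)]                        # drop missing values
  sw <- shapiro.test(x)                     # check the normality assumption
  tt <- t.test(x, mu = mu, conf.level = 1 - alpha)
  list(
    n         = length(x),
    shapiro_p = sw$p.value,
    t         = unname(tt$statistic),
    df        = unname(tt$parameter),
    p_value   = tt$p.value,
    conf_int  = as.numeric(tt$conf.int),
    mean      = unname(tt$estimate),
    decision  = if (tt$p.value < alpha) "reject H0" else "retain H0"
  )
}

# Usage with made-up numbers and a hypothesised mean of 5.
example_values <- c(4.1, 5.3, 4.8, 5.9, 5.1, 4.6, 5.4, 5.0)
report <- one_sample_report(example_values, mu = 5)
report$decision
```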
Exercise 1: Carrots
A farmer is growing carrots for a restaurant. The restaurant wants its carrots to be 10 cm long, so the farmer wants to check whether the carrots in their field differ significantly from the needed length.
# Read in data
carrots <- c(7, 7, 13, 5, 13, 10, 11, 12, 10, 9)
Solution
- Choose level of significance (α)
Answer: 0.05 is generally accepted in the scientific community.
- Write null and alternate hypotheses
H0: \(\mu_{carrot}\) = 10 cm
H1: \(\mu_{carrot}\) \(\neq\) 10 cm
- Check assumptions (normal)
# Boxplot
boxplot(carrots)
# Histogram
hist(carrots)
# Shapiro-Wilk test
shapiro.test(carrots)
Shapiro-Wilk normality test
data: carrots
W = 0.93961, p-value = 0.5486
The data are consistent with a normal distribution (p = 0.5486 > 0.05).
- Calculate test statistic and
- Obtain P-value or critical value
#t test
t.test(carrots, mu = 10)
One Sample t-test
data: carrots
t = -0.35006, df = 9, p-value = 0.7343
alternative hypothesis: true mean is not equal to 10
95 percent confidence interval:
7.761337 11.638663
sample estimates:
mean of x
9.7
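To see where the numbers in this output come from, the test statistic, p-value and confidence interval can be reproduced by hand from the sample mean and standard error. This sketch uses the carrot data given above; the object names are our own.

```r
carrots <- c(7, 7, 13, 5, 13, 10, 11, 12, 10, 9)
mu0 <- 10                              # hypothesised mean (cm)

n     <- length(carrots)
x_bar <- mean(carrots)
se    <- sd(carrots) / sqrt(n)         # standard error of the mean

# Test statistic: distance from mu0 in units of standard error.
t_stat <- (x_bar - mu0) / se

# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# 95% confidence interval for the true mean.
ci <- x_bar + c(-1, 1) * qt(0.975, df = n - 1) * se

round(c(t = t_stat, p = p_value), 4)
round(ci, 2)
```

These values should agree with the t, p-value and confidence interval printed by t.test() above.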
- Make statistical conclusion
p > 0.05, so we retain the null hypothesis
- Write a scientific (biological) conclusion
The mean carrot length does not differ significantly from 10 cm. The farmer's carrots have a mean length of 9.7 cm (95% CI: 7.76, 11.64), which is consistent with the length the restaurant needs.
Exercise 2: Penguins
Rey has just landed on Earth and noticed that penguins look really similar to porgs. Using weight as the point of comparison, she wants to know if two different penguin species weigh the same as her pet porg Stevie, who weighs 4000 g.
We will be using the Palmer penguin dataset to test if chinstrap and gentoo penguins weigh the same as Stevie.
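# install.packages("palmerpenguins")
library(palmerpenguins)
library(tidyverse) # provides %>% and filter(), which are used below
2.1 Chinstrap
# Keep only Chinstrap penguins, then drop rows with a missing value in any column
chinstrap <- penguins %>%
  filter(species == "Chinstrap") %>%
  na.omit()
Solution
- Choose level of significance (α)
Answer: 0.05 is generally accepted in the scientific community.
- Write null and alternate hypotheses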
H0: \(\mu_{chinstrap}\) = 4000g
H1: \(\mu_{chinstrap}\) \(\neq\) 4000g
- Check assumptions (normal)
# Load library
library(tidyverse)
# Q-Q plot
ggplot(chinstrap, aes(sample = body_mass_g)) +
  geom_qq() +
  geom_qq_line()
# Boxplot
ggplot(chinstrap, aes(x = body_mass_g)) +
  geom_boxplot()
# Histogram
ggplot(chinstrap, aes(x = body_mass_g)) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#shapiro test
shapiro.test(chinstrap$body_mass_g)
Shapiro-Wilk normality test
data: chinstrap$body_mass_g
W = 0.98449, p-value = 0.5605
The data are consistent with a normal distribution (p = 0.5605 > 0.05).
- Calculate test statistic and
- Obtain P-value or critical value
#t test
t.test(chinstrap$body_mass_g, mu = 4000)
One Sample t-test
data: chinstrap$body_mass_g
t = -5.7268, df = 67, p-value = 2.631e-07
alternative hypothesis: true mean is not equal to 4000
95 percent confidence interval:
3640.059 3826.117
sample estimates:
mean of x
3733.088
- Make statistical conclusion
p < 0.05, so we reject the null hypothesis
- Write a scientific (biological) conclusion
Chinstrap penguins do not weigh the same as Stevie. On average, chinstrap penguins weigh 3733 g (95% CI: 3640, 3826), so they are lighter than Stevie (4000 g).
2.2 Gentoo
# Keep only Gentoo penguins, then drop rows with a missing value in any column
gentoo <- penguins %>%
  filter(species == "Gentoo") %>%
  na.omit()
Solution
- Choose level of significance (α)
Answer: 0.05 is generally accepted in the scientific community.
- Write null and alternate hypotheses
H0: \(\mu_{gentoo}\) = 4000g
H1: \(\mu_{gentoo}\) \(\neq\) 4000g
- Check assumptions (normal)
# Q-Q plot
ggplot(gentoo, aes(sample = body_mass_g)) +
  geom_qq() +
  geom_qq_line()
# Boxplot
ggplot(gentoo, aes(x = body_mass_g)) +
  geom_boxplot()
# Histogram
ggplot(gentoo, aes(x = body_mass_g)) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#shapiro test
shapiro.test(gentoo$body_mass_g)
Shapiro-Wilk normality test
data: gentoo$body_mass_g
W = 0.98606, p-value = 0.2605
The data are consistent with a normal distribution (p = 0.2605 > 0.05).
- Calculate test statistic and
- Obtain P-value or critical value
#t test
t.test(gentoo$body_mass_g, mu = 4000)
One Sample t-test
data: gentoo$body_mass_g
t = 23.764, df = 118, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 4000
95 percent confidence interval:
5001.403 5183.471
sample estimates:
mean of x
5092.437
- Make statistical conclusion
p < 0.05, so we reject the null hypothesis
- Write a scientific (biological) conclusion
Gentoo penguins do not weigh the same as Stevie. On average, gentoo penguins weigh 5092 g (95% CI: 5001, 5183), so they are heavier than Stevie (4000 g).
Attribution
This lab was developed using resources that are available under a Creative Commons Attribution 4.0 International license, made available on the SOLES Open Educational Resources repository.